Rethinking Spatiotemporal Feature Learning For Video Understanding
Authors
Abstract
In this paper we study 3D convolutional networks for video understanding tasks. Our starting point is the state-of-the-art I3D model of [3], which “inflates” all the 2D filters of the Inception architecture to 3D. We first consider “deflating” the I3D model at various levels to understand the role of 3D convolutions. Interestingly, we find that 3D convolutions at the top layers of the network contribute more than 3D convolutions at the bottom layers, while also being computationally more efficient. This indicates that I3D is better at capturing high-level temporal patterns than low-level motion signals. We also consider replacing 3D convolutions with spatiotemporal-separable 3D convolutions (i.e., replacing a convolution using a k_t × k × k filter with a 1 × k × k filter followed by a k_t × 1 × 1 filter); we show that such a model, which we call S3D, is 1.5x more computationally efficient (in terms of FLOPS) than I3D, and achieves better accuracy. Finally, we explore spatiotemporal feature gating on top of S3D. The resulting model, which we call S3D-G, outperforms the state-of-the-art I3D model by 3.5% in accuracy on Kinetics and reduces the FLOPS by 34%. It also achieves new state-of-the-art performance when transferred to other action classification (UCF-101 and HMDB51) and detection (UCF-101 and JHMDB) datasets.
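To make the two ideas concrete, the sketch below (assuming a PyTorch-style API) factors a k_t × k × k convolution into a 1 × k × k spatial convolution followed by a k_t × 1 × 1 temporal convolution, and adds a simple pooled-sigmoid self-gate in the spirit of S3D-G. The module names SepConv3d and FeatureGate, the channel counts, and the exact gating layout are illustrative assumptions, not the authors' released implementation.

# Minimal sketch, assuming PyTorch; names, channel sizes, and layer ordering
# are illustrative assumptions, not the authors' released code.
import torch
import torch.nn as nn

class SepConv3d(nn.Module):
    """Separable spatiotemporal convolution: a 1 x k x k spatial conv
    followed by a k_t x 1 x 1 temporal conv (replaces one k_t x k x k conv)."""
    def __init__(self, in_ch, out_ch, k=3, kt=3):
        super().__init__()
        # Spatial convolution; padding keeps H and W unchanged.
        self.spatial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, k, k),
                                 padding=(0, k // 2, k // 2), bias=False)
        # Temporal convolution; padding keeps T unchanged.
        self.temporal = nn.Conv3d(out_ch, out_ch, kernel_size=(kt, 1, 1),
                                  padding=(kt // 2, 0, 0), bias=False)
        self.bn = nn.BatchNorm3d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):  # x: (N, C, T, H, W)
        return self.relu(self.bn(self.temporal(self.spatial(x))))

class FeatureGate(nn.Module):
    """Self-gating in the spirit of S3D-G: a per-channel sigmoid weight,
    computed from globally pooled features, rescales the feature map."""
    def __init__(self, ch):
        super().__init__()
        self.fc = nn.Linear(ch, ch)

    def forward(self, x):  # x: (N, C, T, H, W)
        w = x.mean(dim=(2, 3, 4))              # global spatiotemporal pooling -> (N, C)
        w = torch.sigmoid(self.fc(w))          # per-channel gate in [0, 1]
        return x * w[:, :, None, None, None]   # broadcast gate over T, H, W

if __name__ == "__main__":
    clip = torch.randn(2, 3, 16, 112, 112)     # (batch, RGB channels, frames, H, W)
    out = FeatureGate(64)(SepConv3d(3, 64)(clip))
    print(out.shape)                           # torch.Size([2, 64, 16, 112, 112])

Stacking separable blocks of this kind in place of full 3D convolutions is what yields the FLOP savings quoted above; the gate adds only a small per-channel fully connected layer on top.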
Similar resources
ConvNet Architecture Search for Spatiotemporal Feature Learning
Learning image representations with ConvNets by pretraining on ImageNet has proven useful across many visual understanding tasks including object detection, semantic segmentation, and image captioning. Although any image representation can be applied to video frames, a dedicated spatiotemporal representation is still vital in order to incorporate motion patterns that cannot be captured by appea...
Learning spatiotemporal features by using independent component analysis with application to facial expression recognition
Engineered features have been heavily employed in computer vision. Recently, feature learning from unlabeled data for improving the performance of a given vision task has received increasing attention in both machine learning and computer vision. In this paper, we present using unlabeled video data to learn spatiotemporal features for video classification tasks. Specifically, we employ independ...
Learning semantic features for action recognition via diffusion maps
Efficient modeling of actions is critical for recognizing human actions. Recently, bag of video words (BoVW) representation, in which features computed around spatiotemporal interest points are quantized into video words based on their appearance similarity, has been widely and successfully explored. The performance of this representation however, is highly sensitive to two main factors: the gr...
The Effect of Family Nursing Education Using Reflection Method with the Help of Situation Simulation Through Video Screening on Learning and Perspective of Nursing Students
Introduction: Reflection is one of the basic methods of education that is effective in raising the level of awareness and skills in clinical situations. The aim of this study was to investigate the effect of family nursing education using reflection method with the help of situation simulation through video screening on learning and perspective of nursing students. Methods: This quasi-experimen...
A Conversation Analytic Study on the Teachers’ Management of Understanding-Check Question Sequences in EFL Classrooms
Teacher questions are claimed to be constitutive of classroom interaction because of their crucial role both in the construction of knowledge and the organization of classroom proceedings (Dalton Puffer, 2007). Most of previous research on teachers’ questions mainly focused on identifying and discovering different question types believed to be helpful in creating the opportunities for learners’...
Journal: CoRR
Volume: abs/1712.04851
Pages: -
Year of publication: 2017